Abstract

Objective To investigate the effect of an additional review based on 
reporting guidelines such as STROBE and CONSORT on quality of 
manuscripts. 

Design Masked randomised trial. 

Population Original research manuscripts submitted to the Medicina 
Clínica journal from May 2008 to April 2009 and considered suitable for
publication. 

Intervention Control group: conventional peer reviews alone. Intervention 
group: conventional review plus an additional review looking for missing 
items from reporting guidelines. 

Outcomes Manuscript quality, assessed with a 5 point Likert scale 
(primary: overall quality; secondary: average quality of specific items in 
paper). Main analysis compared groups as allocated, after adjustment 
for baseline factors (analysis of covariance); sensitivity analysis compared
groups as reviewed. Adherence to reviewer suggestions assessed with
Likert scale. 

Results Of 126 consecutive papers receiving conventional review, 34 
were not suitable for publication. The remaining 92 papers were allocated 
to receive conventional reviews alone (n=41) or additional reviews (n=51).
Four papers assigned to the conventional review group deviated from 
protocol; they received an additional review based on reporting 
guidelines. We saw an improvement in manuscript quality in favour of 
the additional review group (comparison as allocated, 0.25, 95% 
confidence interval -0.05 to 0.54; as reviewed, 0.33, 0.03 to 0.63). More 
papers with additional reviews than with conventional reviews alone 
improved from baseline (22 (43%) v eight (20%), difference 23.6% (3.2%
to 44.0%), number needed to treat 4.2 (from 2.3 to 31.2), relative risk
2.21 (1.10 to 4.44)). Authors in the additional review group adhered more
to suggestions from conventional reviews than to those from additional
reviews (average increase 0.43 Likert points (0.19 to 0.67)).
Conclusions Additional reviews based on reporting guidelines improve
manuscript quality, although the observed effect was smaller than 
hypothesised and not definitively demonstrated. Authors adhere more 
to suggestions from conventional reviews than to those from additional 
reviews, showing difficulties in adhering to high methodological standards 
at the latest research phases. To boost paper quality and impact, authors 
should be aware of future requirements of reporting guidelines at the 
very beginning of their study. 

Trial registration and protocol Although registries do not include trials 
of peer review, the protocol design was submitted to sponsored research 
projects (Instituto de Salud Carlos III, PI081903).

Introduction 

The scientific value of biomedical journals relies on the peer 
review process and on editorial decisions, but the quality of 
these processes is far from guaranteed. 1 4 Processes, be they in 
patient care or in peer review, can be improved through 
interventions, which are then evaluated by trials. In 1992, 
Drummond Rennie 5 called for scientific proof of the value of 
the peer review system. A Cochrane review updated in 2007 6 
concluded that little evidence supports the effectiveness of 
scientific peer review, since most studies tested the specific 
effects of masking either authors or reviewers. Investigators 
have also attempted to improve the quality of peer review, 7 
reduce reviewer burden, 8 lessen reviewer bias, 9 and improve the 
detection of fraud. 10 However, only two of these studies were
randomised controlled trials, and all of them focused on 
surrogate variables related to the review process and not on the 
true outcome: manuscript quality. 

In recent years, the need to establish common, minimum 
standards of quality resulted in the development of reporting 
guidelines. These guidelines are defined as: "statements that 
provide advice on how to report research methods and findings 
. . . they specify a minimum set of items required for a clear and 
transparent account of what was done and what was found in a 
research study, reflecting in particular issues that might 
introduce bias into the research." 11 Specific guidelines have
been developed for different kinds of medical investigation, 
such as those estimating intervention effects (CONSORT 
(consolidated standards of reporting trials), 12 TREND 
(transparent reporting of evaluations with non-randomised 
designs) 13 ), assessing causes and prognosis (STROBE 
(strengthening the reporting of observational studies in 
epidemiology) 14 ), quantifying accuracy of diagnosis and 
prognosis tools (STARD (standards for the reporting of 
diagnostic accuracy studies), 15 REMARK (reporting 
recommendations for tumour marker prognostic studies) 16 ), 
testing genetic associations (STREGA (strengthening the 
reporting of genetic associations) 17 ), and aggregating evidence 
(PRISMA (preferred reporting items for systematic reviews and 
meta-analyses) 18 ). Some reporting guidelines have been shown 
to improve the quality of reports, such as CONSORT. 19 20 

Our team at the Medicina Clínica journal previously undertook
two randomised trials to determine whether adding a statistical 
reviewer had any effect on final manuscript quality; the first 
trial 21 suggested a positive benefit that was confirmed in the 
second. 22 This second trial also investigated the effect of 
suggesting the use of reporting guidelines to reviewers, but did 
not observe any benefit. 

The present study focuses on the merged effects of statistical 
reviews and reviewing guidelines. The intervention consisted 
of an additional review from a senior statistician asking authors 
to provide information about incomplete or missing items from 
reporting guidelines. This additional review was possible
because the launch of the STROBE guideline in 2007 allowed
us to systematically apply a reporting guideline to almost any
paper submitted to Medicina Clínica. Thus, we aimed to quantify
the effects of an additional review based on reporting guidelines 
on the quality of the final manuscript in a weekly medical 
journal with no specific requirements to follow reporting 
guidelines. After analysis of the overall results, we considered 
another hypothesis: when updating their manuscript, would 
authors adhere more to suggestions based on conventional 
reviews, rather than to those based on reporting guidelines? 

Methods 

Study design and population 

The study was a randomised trial and the web appendix provides 
full details of the trial protocol. Medicina Clínica is a weekly
journal based in Barcelona, Spain, with an impact factor of 1.4
that receives more than 300 original papers each year and 
publishes about a third of submissions. The journal did not ask 
authors to adhere to reporting guidelines. The current editors 
(MV and CRJ) assumed their roles on the journal in January 
2000, and have since recruited AS, AU, VF, and EC. The two 
general secretaries (JMR and FC) have more than 15 years' 
experience. 

During the selection process, the first editorial decision chose 
which papers were sent to reviewers (fig 1). The second
decision selected which papers were returned to authors for
improvement. We defined the study population as all original
research manuscripts received by Medicina Clínica from 1 May
2008 to 30 April 2009 that successfully passed through the 
second editorial decision after conventional peer review. 

Intervention 

All papers were reviewed by the usual referee team (usually 
two clinicians, or one clinician with either one statistician or 
one epidemiologist). Manuscripts in the intervention group also 
received an additional review based on reporting guidelines. A 
senior statistician (EC) did the additional reviews, and 
persistently provided suggestions on how to follow reporting 
guideline checklists. The manuscript study type determined the 
guideline used in the review: STROBE, CONSORT 2001, 23 
TREND, STARD, STREGA, and REMARK. The box shows 
an example of a review based on reporting guidelines. 

Although additional reviews were only sent to authors in the 
intervention group, these additional reviews were done for every 
paper accepted conditionally in the first editorial decision (fig 
1). This step had three purposes: to maintain manuscript flow 
throughout the editorial process; to keep the main investigator 
masked; and to obtain a score for the initial quality, which was 
needed for subsequent random allocation. 

The 92 papers eligible for randomisation received a mean of 
12.8 (standard deviation 9.6) reviewer suggestions per paper. 
Each additional review contained a mean of 13.5 (3.9)
suggestions and was 446 (146) words long; the paper took 28.4 (11.2) minutes
to read, and the review needed 40.8 (15.6) minutes to draft. The
reporting guidelines used were: STROBE (85 papers, 92%), 
CONSORT (17, 18%), TREND (14, 15%), STARD (nine, 10%), 
STREGA (two, 2%), and REMARK (one, 1%). For some papers, 
suggestions related to more than one reporting guideline: 
CONSORT, STROBE, and TREND (13 papers, 14%); STROBE 
and STARD (seven, 8%); STROBE and STREGA (two, 2%); 
and CONSORT and TREND (one, 1%). 

Most suggestions followed the guideline wording very closely
("STROBE 14: Please provide baseline characteristics of study
participants"), but some were more elaborate ("STROBE 12
and 16: Numerical covariates employed for adjusting were
categorised: please check whether you get the same results if
you choose a different cut point, or whether you treat them as
numerical"). The table classifies the number of suggestions
based on reporting guidelines for every paper section.

Box: Example of review based on reporting guidelines

In accordance with the STROBE statement applied to cross sectional studies and with the aim of increasing transparency and clarity
(www.strobe-statement.org/Checklist.html), authors should consider the following modifications:

Strobe 3: Please specify secondary objectives. Of the large number of relationships you have studied indicate, where appropriate, which
of them had previous hypotheses. Otherwise, comment on this in the discussion, for example, whether or not these results should be
considered as exploratory.

Strobe 5: Please indicate data collection dates.

Strobe 10: Please specify the reasons for collecting a specific sample size. If it was previously stated, please provide the rationale as
well as either power or accuracy.

Strobe 11: Please explain the rationale for quantitative outcome cut points and whether or not they were previously specified.

Strobe 13: It is implicitly understood that all consecutive patients were selected and all of them agreed to participate. It is also understood
that there are missing data for four deaths only. Please make these points explicit and detail any deviations.

Strobe 16: Please provide 95% confidence intervals for the estimated proportion of the primary objective. If the normal, large sample
approximation cannot be applied, consider exact binomial methods: confidence intervals are especially relevant for small samples.

Strobe 22: If applicable, specify all the sponsors and their control over the publication of results. Clarify similarities and differences with
previous studies.

Allocation 

Before randomisation, all manuscripts were given an ad hoc 
assessment by the senior statistician (EC) using a score ranging 
from 1 to 9, in order to give a global measure of report quality 
at baseline. With these scores, we were able to use a random 
minimisation algorithm to balance mean differences in the ad 
hoc score as well as differences in study type counts (that is, 
intervention, longitudinal, cross sectional, and other type), but 
not to equilibrate the overall number of manuscripts in both 
groups. The algorithm gave probabilities from 0.5 (in the case 
of indifferent allocation to one or another group) to 0.8 (if both 
minimisation factors indicated allocation to the same group). 
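
The allocation rule described above can be illustrated with a minimal Python sketch. It is not the trial's actual algorithm, which the text does not specify in detail: the function names, the paper records, and the 0.15 probability step per factor are assumptions, chosen only so that the allocation probability is 0.5 when the two factors are indifferent or disagree and 0.8 when both point to the same group.

```python
import random
from statistics import mean

ARMS = ("conventional", "additional")


def favoured_by_score(groups, score):
    """Arm that would leave the smaller gap in mean baseline score (None if tied)."""
    gaps = {}
    for arm in ARMS:
        trial = {a: [p["score"] for p in groups[a]] for a in ARMS}
        trial[arm].append(score)
        if not all(trial.values()):  # one arm would still be empty: nothing to compare
            return None
        gaps[arm] = abs(mean(trial[ARMS[0]]) - mean(trial[ARMS[1]]))
    if gaps[ARMS[0]] == gaps[ARMS[1]]:
        return None
    return min(gaps, key=gaps.get)


def favoured_by_type(groups, study_type):
    """Arm currently holding fewer papers of this study type (None if tied)."""
    counts = {a: sum(p["type"] == study_type for p in groups[a]) for a in ARMS}
    if counts[ARMS[0]] == counts[ARMS[1]]:
        return None
    return min(counts, key=counts.get)


def probability_additional(groups, paper, step=0.15):
    """Probability of allocating `paper` to the additional review arm.

    `step` is an assumed increment per minimisation factor (hypothetical value).
    """
    votes = [favoured_by_score(groups, paper["score"]),
             favoured_by_type(groups, paper["type"])]
    balance = votes.count("additional") - votes.count("conventional")
    return 0.5 + step * balance  # between 0.2 and 0.8


def allocate(groups, paper, rng=random):
    """Biased coin draw; the paper is appended to the chosen arm."""
    p = probability_additional(groups, paper)
    arm = "additional" if rng.random() < p else "conventional"
    groups[arm].append(paper)
    return arm
```

Because the rule balances baseline score and study type but not group size, unequal arms such as the 41 v 51 split reported in this trial are an expected outcome of this kind of scheme.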

Allocation concealment 

The second editorial decision (after peer review and before 
randomisation) took place without committee members knowing 
which papers were allocated to receive the additional review 
(fig 1). At later editorial decisions, committee members saw the 
additional reviews of papers in the intervention group. 

Outcome assessment 

We obtained the baseline and final scores by using a manuscript 
quality assessment instrument, designed by Goodman and 
colleagues 24 and used in our previous trial (web table 1). 20 The
instrument uses a 5 point Likert scale from 1 (low) to 5 (high), 
and comprises 37 items that assess the quality of the research 
report — not the quality of the research itself. The first item refers 
to the overall quality of the manuscript (our primary outcome). 
Of the remaining 36 specific items, 28 (78%) refer directly to 
key items in reporting guidelines and eight (22%) refer to paper 
format and style. 

The secondary outcome was the average of all pertinent 
items — that is, after excluding specific items that did not apply 
to the current study. The evaluators were three junior statisticians 
(JC, BK, and LG) with experience in teaching scientific critical 
reading to health professionals. The evaluators first rated each 
paper individually but, because they were expected to raise 
different methodological concerns, they were allowed to know 
each other's opinions before reaching a consensus. If a
consensus was not met, the final score was the average of the
individual scores.

Statistical analyses 

Main hypotheses 

The main statistical analysis specified in the protocol compared 
papers according to their initial allocation ("as allocated" 
comparison), adjusted for baseline quality and study type using 
an analysis of covariance. We did a secondary "as reviewed" 
comparison (that is, comparing papers according to the reviews 
they actually received, irrespective of initial allocation) to assess 
the effect of protocol deviations on the conclusion. We also 
compared the proportion of papers that improved from the 
baseline. The reliability of individual ratings was assessed with 
the intraclass correlation coefficient. 

Post hoc hypothesis 

A statistical researcher (LC) classified, pooled, and masked 
reviewer suggestions. Two junior statisticians (BK, LG) then 
rated each suggestion's relevance and the authors' adherence 
in the final manuscript version to each suggestion using two 
Likert scales (web table 2). Because manuscripts had a different
number of suggestions, the authors' adherence to suggestions
was averaged within the paper and compared between groups
with a t test weighted by the root of the number of suggestions.
We also did an equivalent weighted paired t test to compare the 
adherence to reviewer suggestions between conventional reviews 
and additional reviews in the intervention group. We fitted a 
mixed model with random effects accounting for both reviewer 
and author variability to analyse sensitivity to the statistical 
analysis. 

Sample size calculation indicated that 50 papers per group 
allowed 80% power for a 55% standardised difference between 
groups, in relation to the mean change in scores from the initial 
version to the final version of the paper. In the previous year 
(2007), the first editorial decision rejected 186 (57%) of 328 
received manuscripts and the second decision rejected 24 (17%) 
of the remaining 142; therefore, 118 papers were sent to authors 
with reviewer suggestions. Consequently, we defined an entire 
year as the recruitment period. For this study, both authors and 
referees were informed that their material could be used to assess 
the quality of the editorial process. 
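
As a rough check of the sample size statement, the power of a two sided comparison of two means can be approximated with the normal distribution. The sketch below is an assumption about the calculation, not the authors' stated method: it uses a 5% two sided significance level (not given in the text) and ignores the negligible probability of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n1, n2, alpha=0.05):
    """Normal approximation to the power of a two sided, two sample comparison of means.

    effect_size is the standardised difference (difference divided by the SD).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(noncentrality - z_crit)


planned = approx_power(0.55, 50, 50)   # design: 50 papers per group
achieved = approx_power(0.55, 41, 51)  # groups actually randomised
print(f"planned {planned:.2f}, achieved {achieved:.2f}")
```

Under these assumptions the planned design gives power close to the stated 80%, and the 41 v 51 split that was eventually randomised gives somewhat less.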
Results
Flow 

From May 2008 to April 2009, 126 consecutive original papers 
were included in the study (fig 1). Of these papers, 34 (27%) 
were rejected on the basis of the conventional review, resulting 
in 92 randomised papers. Study types of the included 
manuscripts were: 16 (17%) intervention studies, mainly 
before-after studies with only five randomised trials; 38 (41%) 
longitudinal studies; 26 (28%) cross sectional studies; and 12 
(13%) studies of other types (mainly diagnostic studies). We 
saw protocol deviations in four papers in the conventional 
review group, which underwent an additional review based on 
reporting guidelines before the scheduled date (that is, cross-in 
manuscripts). 

Outcome reliability and validity assessment 

The intraclass correlation coefficient for individual ratings before
the consensus discussion was 0.46. While rating the updated manuscripts, masked
evaluators guessed the allocated group in 62% (56/90) of papers
(95% confidence interval 51% to 72%; individual percentages
of success 56% (50/90), 57% (51/90), and 68% (61/90)). Overall,
when looking at the authors' adherence to any reviewer
suggestion, the evaluators were also able to guess the
suggestion's origin (that is, from a conventional review or an 
additional review) in 56% (n=855, 95% confidence interval 
54% to 59%) of the 1521 suggestions. Evaluators also guessed 
whether the type of review was from a clinician or statistician 
in 63% (n=961, 61% to 66%) of the suggestions. 
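
The text does not state how the confidence intervals for these guessing proportions were obtained. As one possibility, the following sketch computes an exact (Clopper-Pearson) binomial interval by bisection, using only the Python standard library; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=60):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two sided confidence interval for a binomial proportion k/n."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


low, high = clopper_pearson(56, 90)  # allocated group guessed in 56 of 90 papers
print(f"{56 / 90:.0%} ({low:.0%} to {high:.0%})")
```

An interval that contains 50% would be compatible with pure guessing; the further the lower bound lies above 50%, the stronger the indication that masking was incomplete.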

Selected papers 

According to the ad hoc 1 to 9 scale for baseline quality, the 34 
papers rejected after the second editorial decision (fig 1) had 
a lower mean score than the 92 accepted papers (3.68 (standard
deviation 2.24) v 4.75 (2.20), difference 1.07 (95% confidence 
interval 0.19 to 1.95)). 

Baseline quality based on 1 to 5 Likert scale 

The groups receiving conventional reviews alone and additional 
reviews had similar mean Goodman scores overall at baseline 
(3.00 v 2.84, fig 2). Specific items in reporting guidelines with
high initial scores were oversight (4.54), analysis of multiple 
measures (4.05), and organisation (4.04). The worst scoring 
items were masking (1.45), dropout analysis (1.86), and dropout 
description (1.92). Standard deviations of the specific items 
varied from 0.69 (oversight) to 1.72 (confidence intervals). 
Pooled standard deviations of the overall and average quality 
of papers were 1.01 and 0.50, respectively. 

Intervention effect 

Overall quality (primary outcome) was higher in papers 
receiving additional reviews than in those receiving conventional 
reviews alone (0.55 (standard deviation 0.83) v 0.27 (0.59);
adjusted improvement 0.25 (95% confidence interval -0.05 to
0.54); fig 3); this difference in quality was significant in the
"as reviewed" population (0.33, 0.03 to 0.63). We obtained
almost identical results for the average quality (secondary
outcome) of all valid items (as allocated comparison, 0.11 (-0.01
to 0.22); as reviewed comparison, 0.15 (0.04 to 0.27)). A post
hoc interaction test showed that additional reviews had an 
increased effect on quality of the 16 intervention studies (0.87, 
0.01 to 1.74; P=0.04; fig 3). 



More papers improved from baseline in overall quality in the 
additional review group than in the conventional review group 
(22 (43%) v eight (20%); relative risk 2.21, 95% confidence
interval 1.10 to 4.44; difference 23.6%, 3.2% to 44.0%; number
needed to treat 4.2, 2.3 to 31.2; fig 4). This effect increased if
we incorporated the four manuscripts with protocol deviations 
into the "as reviewed" analysis (relative risk 3.36, 95% 
confidence interval 1.42 to 7.99). The estimated effect was fairly 
similar in papers that did or did not include a statistician in the 
conventional reviews (data not shown). 
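
The summary measures for the proportion of improved papers can be recomputed from the counts 22/51 and 8/41. The sketch below assumes a Wald interval with continuity correction for the risk difference and a log scale interval for the relative risk; the text does not name the methods used, although these choices appear to agree with the reported intervals after rounding.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def compare_proportions(x1, n1, x0, n0, level=0.95):
    """Risk difference, relative risk, and number needed to treat for two groups.

    x1/n1: improved papers in the additional review group.
    x0/n0: improved papers in the conventional review group.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p0 = x1 / n1, x0 / n0

    diff = p1 - p0
    se_diff = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    correction = (1 / n1 + 1 / n0) / 2  # continuity correction (assumed)
    diff_ci = (diff - z * se_diff - correction, diff + z * se_diff + correction)

    rr = p1 / p0
    se_log_rr = sqrt(1 / x1 - 1 / n1 + 1 / x0 - 1 / n0)
    rr_ci = (exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr))

    # Inverting the difference is meaningful only when its interval excludes 0.
    nnt = 1 / diff
    nnt_ci = (1 / diff_ci[1], 1 / diff_ci[0])

    return {"diff": diff, "diff_ci": diff_ci,
            "rr": rr, "rr_ci": rr_ci,
            "nnt": nnt, "nnt_ci": nnt_ci}


result = compare_proportions(22, 51, 8, 41)
for name, value in result.items():
    print(name, value)
```

The number needed to treat is the reciprocal of the risk difference, so its interval is obtained by inverting the bounds of the difference and swapping their order.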

Adherence to reviewer suggestions 

Since most papers were located above the diagonal line in fig
5, the graph showed that, in the intervention group, authors
adhered more to suggestions from conventional reviews than
to those from additional reviews based on reporting guidelines
(3.08 (0.90) v 2.70 (0.80)). The difference was significant in a
weighted paired comparison of means (0.43, 95% confidence
interval 0.19 to 0.67). The weighted correlation between both
adherences was 0.46 (P=0.01), showing the consistency among
authors to consider and to include both kinds of reviewer
suggestions. The estimated weighted mean adherence to conventional
suggestions was higher in the conventional group (3.37) than in the
additional review group (3.14), although the between group difference was
not significant (0.23; -0.17 to 0.63). The mixed model sensitivity
analysis showed similar results.

Discussion 
Summary of findings 

Our data indicated that specific reviewer recommendations to 
authors in order to fulfil reporting guidelines boosted the number 
of improved papers from 20% to 43% during the peer review 
process. But a high proportion of papers (45 (88%), fig 4) did 
not reach the maximum quality of reporting, based on Goodman 
score assessments by junior statisticians. Furthermore, although 
the postulated effect on the primary outcome corresponded to 
an average improvement of 0.40 on the Likert scale (that is, a 
1 point improvement in two of five manuscripts), the observed 
improvement was only 0.25 (that is, a 1 point improvement in 
one of four manuscripts). Therefore, although we had secondary 
evidence of effect, its size was too small to consider that our 
intervention reached its objectives. 

The finding that authors adhered more to suggestions from 
conventional reviews than to those from additional reviews 
invites several interpretations. Firstly, authors might not be 
aware of reporting guidelines in the previous phases of the study, 
which might make it difficult to incorporate their 
recommendations in the report. Secondly, authors might prefer 
to concentrate their efforts on more conventional suggestions. 
Thirdly, authors in the additional review group might have to 
cope with a much higher number of suggestions than authors 
in the conventional review group, and could automatically 
adhere less to any recommendation, which was supported by 
the observed trend towards lower adherence to conventional 
suggestions in the additional review group than that in the 
conventional review group. 

Strengths and limitations of the study 

We included any manuscript conditionally accepted after peer 
review within the scheduled time window. By following 
published guidelines, we also facilitated the definition of our 
intervention and its future replication. However, our study had 
several limitations. Firstly, the estimated effect could be 
interpreted as mainly a STROBE effect, since this reporting
guideline was used in the vast majority of assessments. 
However, the observed effect in this type of study was 
significantly reduced (fig 3). The increased effect in the 16 
intervention studies (fig 3) could be attributed to the longer 
tradition of intervention study guidelines. 

Secondly, since our trial was conducted in one journal and the 
intervention relied on only one statistician, external validity was 
limited. Thirdly, we designed a masked assessment method, but 
observed that evaluators were perhaps not completely blinded. 
If they subconsciously rated manuscripts higher in the additional 
review group, the true intervention effect would have been 
slightly smaller. Fourthly, we had four protocol deviations; this 
number is fairly low if we consider the complexity of the 
process, but high enough to complicate the interpretation of the 
results. 

Furthermore, the second editorial decision meeting had a higher 
rejection rate (27%) than the previous year (17%), resulting in 
only 92 randomised papers and a slightly lower power than that 
designed for 100 papers. Finally, although the Goodman score 
was especially developed to measure quality improvement 
during the peer review process, it has not been updated since 
the development of reporting guidelines, and its validity has not 
been formally assessed — probably owing to the absence of a 
gold standard. As Friedberg recently highlighted, "developing 
useful instruments to measure manuscript quality remains a 
huge challenge." 25 Ultimately, paper quality is a surrogate for 
the true purpose of research: to have clarity, transparency, 
reproducibility, and impact, both on healthcare and on scientific 
research. 

Conclusions and policy implications 

As mentioned previously, very few randomised trials have 
assessed interventions to improve manuscript quality after peer 
review. Health research is responsible for improvements in 
healthcare. But before implementing its results, the last and 
essential step is communication and dissemination, which relies 
on the peer review process. If we want to further improve our 
health system, we should develop and select efficient 
interventions and accurate prognosis and diagnosis tools. To 
avoid poor health research, 2 3 we need competent communication 
and editorial processes, with transparent publications and a low 
false discovery rate. 26-29 Although reporting guidelines are a
profoundly reasonable procedure for boosting paper quality, 
reasonability does not imply effect. 2 " 

Authors in a mid-level medical journal have more difficulties 
in following suggestions based on reporting guidelines than 
those from conventional reviews alone. If authors do not 
consider key methodological features at the design and execution 
phases of their study, they will have difficulties in improving 
the paper at the later scientific phases. To boost paper quality 
and impact, authors should be aware of future requirements of 
reporting guidelines at the very beginning of their study, and 
peer reviewers should be made aware of the importance of 
transparent reporting and receive training if needed. 
What is already known on this topic

Manuscript quality in biomedical journals is far from guaranteed, despite the continued use of peer review after submission 
Reporting guidelines have been developed to improve manuscript quality and transparency 

What this study adds 

Additional reviews based on reporting guidelines resulted in a moderate improvement in manuscript quality 

Authors have difficulties in adhering to high standards of reporting during the writing phase; awareness of guidelines should be guaranteed 
during the design and execution of the study 
Table

Table 1 Total number of suggestions based on reporting guidelines

| Common and specific items | STROBE (N=85): item | n (%) | CONSORT plus extensions (N=17)*: item | n (%) | TREND (N=14): item | n (%) | STARD (N=9): item | n (%) | Total (N=92)†: n (%) |
|---|---|---|---|---|---|---|---|---|---|
| Title and abstract | 1 | 53 (62) | 1 | 9 (53) | 1 | 11 (79) | 1 | 1 (11) | 76 (72) |
| | 1a | 0 | | | | | | | |
| | 1b | 2 (2) | | | | | | | |
| Introduction, background | 2 | 13 (15) | 2 | 1 (6) | 2 | 0 | | | 14 (14) |
| Methods | | | | | | | | | |
| Participants and recruitment | 5 | 47 (53) | 3 | 11 (65) | 3 | 16 (100) | 3 | 5 (56) | 138 (75) |
| | 6 | 49 (54) | | | | | 4 | 2 (22) | |
| | 6a | 4 (5) | | | | | | | |
| | 6b | 0 | | | | | | | |
| Objectives | 3 | 49 (58) | 5 | 7 (41) | 5 | 7 (50) | 2 | 3 (33) | 66 (63) |
| Variables, measurements | | | | | | | | | |
| Interventions | | | 4 | 12 (71) | 4 | 8 (57) | | | 20 (13) |
| Standard, outcomes | 7 | 36 (40) | 6 | 11 (59) | 6 | 6 (36) | 7 | 2 (22) | 96 (67) |
| | 8 | 31 (35) | | | 10 | 0 | 8 | 0 | |
| | | | | | | | 9 | 1 (11) | |
| | | | | | | | 10 | 9 (100) | |
| Sample size | 10 | 81 (95) | 7 | 14 (82) | 7 | 12 (86) | | | 108 (92) |
| Bias, randomisation, study design | 4 | 26 (29) | 8 | 3 (18) | 8 | 2 (14) | 5 | 4 (33) | 91 (64) |
| | 9 | 44 (51) | 9 | 5 (29) | | | 6 | 2 (22) | |
| | | | 10 | 4 (24) | | | | | |
| Masking | | | 11 | 10 (35) | 9 | 3 (14) | 11 | 8 (89) | 21 (15) |
| Statistical methods | 11 | 32 (38) | 12 | 29 (94) | 11 | 18 (93) | 12 | 11 (78) | 197 (86) |
| | 12 | 77 (60) | | | | | 13 | 9 (100) | |
| | 12a | 7 (8) | | | | | | | |
| | 12b | 4 (4) | | | | | | | |
| | 12e | 6 (6) | | | | | | | |
| Missing data | 12c | 25 (30) | 16 | 9 (47) | | | 22 | 1 (11) | 53 (38) |
| | 12d | 12 (14) | | | | | | | |
| | 14b | 4 (5) | | | | | | | |
| Funding | 22 | 81 (95) | | | | | | | 81 (88) |
| Results | | | | | | | | | |
| Participant flow | 13 | 81 (77) | 13 | 13 (59) | 12 | 6 (43) | 16 | 2 (22) | 103 (78) |
| Recruitment | 14c | 18 (20) | 14 | 5 (29) | 13 | 1 (7) | 14 | 2 (22) | 30 (26) |
| | | | | | | | 17 | 3 (33) | |
| Baseline data | 14 | 23 (26) | 15 | 4 (24) | 14 | 0 | 15 | 1 (11) | 34 (32) |
| | 14a | 3 (4) | | | 15 | 1 (7) | | | |
| Numbers analysed | 15 | 20 (24) | 16 | 9 (47) | 16 | 1 (7) | 18 | 1 (11) | 31 (30) |
| Outcomes and estimation | 16 | 123 (89) | 17 | 25 (88) | 17 | 19 (79) | 19 | 2 (22) | 188 (94) |
| | 16a | 1 (1) | | | | | 21 | 4 (44) | |
| | 16b | 0 | | | | | 23 | 0 | |
| | 16c | 1 (1) | | | | | 24 | 9 (100) | |
| Ancillary analyses | 17 | 8 (8) | 18 | 5 (29) | 18 | 1 (7) | | | 14 (12) |
| Adverse events | | | 19 | 14 (82) | 19 | 10 (71) | 20 | 7 (78) | 31 (23) |
| Discussion | | | | | | | | | |
| Interpretation | 18 | 65 (58) | 20 | 20 (88) | 20 | 4 (21) | | | 170 (80) |
| | 19 | 80 (71) | | | | | | | |
| Generalisability | 21 | 47 (46) | 21 | 2 (12) | 21 | 0 | 25 | 3 (33) | 52 (48) |
| Overall evidence | 20 | 83 (68) | 22 | 3 (18) | 22 | 0 | | | 86 (66) |
| Total (100%) | | 1236 | | 225 | | 126 | | 92 | 1700 |

N=total number of manuscripts; n=number of times each reporting guideline item was used (since each item might have more than one suggestion, n can be
greater than N); %=manuscripts with at least one suggestion divided by total number of manuscripts (N).
*Includes CONSORT 2001 and CONSORT for non-pharmacological treatment interventions.

†Two further reporting guidelines were used sporadically: STREGA in two manuscripts with suggestions about participants (three), statistical methods (two),
baseline data (two) and outcomes (two); and REMARK in one manuscript with suggestions about participants (one), sample size (one), study design (one), statistical
methods (four), participant flow (one), recruitment (one), outcomes (two), and interpretation (one).
Figures

Manuscripts (n=343)
-> Editorial decision 1
   -> Not enough priority (n=217)
   -> Standard peer review process (n=126), with additional review based on reporting guidelines (n=126)* and measurement of baseline minimisation variables (n=126)*
-> Editorial decision 2
   -> Rejected (n=34)
   -> Random allocation (n=92)
      -> Conventional reviews + additional review sent to authors (n=51)
      -> Conventional reviews only sent to authors (n=41)
-> Author improvements to manuscript based on reviews
-> Editorial decision 3 (n=92)

Fig 1 Study design and manuscript flow. *Additional reviews and measurement of minimisation variables were undertaken
during the standard peer review process, but this information was concealed until the later editorial stages



No of papers/mean score 

No   Item                          Conventional review group   Additional review group 
     Overall quality               41/3.00                     51/2.84 
     Average quality               41/3.40                     51/3.32 
1    Oversight                     41/4.51                     51/4.57 
2    Analysis multiple measures    20/4.10                     24/4.00 
3    Organisation                  41/3.98                     51/4.10 
4    Aims                          41/4.22                     51/3.86 
5    Background                    41/3.90                     51/4.02 
6    Major variables               41/3.98                     51/3.90 
7    Style                         41/3.95                     51/3.92 
8    Sample description            41/3.98                     51/3.82 
9    Eligibility                   41/3.85                     51/3.76 
10   Other supporting              41/3.95                     51/3.67 
11   Design                        41/3.71                     51/3.63 
12   Balanced reporting            41/3.71                     51/3.55 
13   Concision                     41/3.66                     51/3.57 
14   Setting and source            41/3.51                     51/3.63 
15   Strength                      41/3.56                     51/3.57 
16   Abstract                      40/3.60                     51/3.51 
17   Denominators                  38/4.00                     50/3.12 
18   Title                         41/3.46                     51/3.47 
19   Quantitative methods          41/3.46                     51/3.43 
20   Clear reporting               41/3.46                     51/3.24 
21   Suitability comparisons       13/3.38                     18/3.28 
22   Figures and tables            41/3.17                     51/3.24 
23   Lost to follow-up             17/3.00                     20/3.25 
24   New knowledge                 41/2.95                     51/3.12 
25   Subgroup effects              39/2.56                     49/3.02 
26   Confidence intervals          39/2.95                     51/2.63 
27   Effect size                   37/3.03                     47/2.47 
28   Diagnostic test               2/2.50                      4/2.75 
29   Report multiple measures      20/2.60                     25/2.48 
30   Power                         40/2.52                     51/2.51 
31   Limitations                   41/2.54                     51/2.45 
32   Side effects                  7/2.86                      4/1.75 
33   Generalising                  41/1.95                     50/2.30 
34   Dropouts descriptions         24/2.33                     25/1.52 
35   Dropouts analysis             21/1.81                     21/1.90 
36   Masking                       6/1.17                      14/1.57 

[Stacked bar chart. Horizontal axis: proportion of scores (%), 0 to 100. Legend: Goodman score, from poorest quality to highest quality.] 
Fig 2 Goodman quality scores at baseline. Bars contain proportion of scores from 1 (dark shade) to 5 (light shade), with 
cumulative percentages shown in the bottom scale. Gradation colour for the average quality was consequently adapted 
(break points are 2.5, 3.0, 3.5, and 4.0). Dots represent total mean for each specific item 



Main and secondary outcomes 

Outcome                                        Mean difference (95% CI) 
Overall quality as allocated                   0.25 (-0.05 to 0.54) 
Overall quality as reviewed                    0.33 (0.03 to 0.63) 
Average quality as allocated                   0.11 (-0.01 to 0.22) 
Average quality as reviewed                    0.15 (0.04 to 0.27) 
Overall quality as allocated by study type 
  Intervention study (n=16)                    0.87 (0.01 to 1.74) 
  Longitudinal study (n=38)                    0.19 (-0.29 to 0.67) 
  Cross sectional study (n=26)                 0.08 (-0.43 to 0.58) 
  Other type (n=12)                            -0.15 (-1.10 to 0.81) 

[Forest plot. Horizontal axis: mean difference, -1.0 to 1.0; negative values indicate a worsening effect, positive values an improvement.] 

Fig 3 Effect of additional reviews on overall and average quality in "as allocated" and "as reviewed" populations, and primary 
analysis stratified by study type 



[Two panels, "Conventional review group" and "Additional review group", showing counts of manuscripts by baseline and final Goodman score (1 to 5). Legend: Worse; Equal; Improve by 1 point; Improve by 2 points; Improve by 3 points. Below the panels, one bar for each group shows the proportion (%) of manuscripts in each category, on an axis from 0 to 100.] 

Fig 4 Baseline and final Goodman quality scores in allocated groups. Numbers after the plus signs indicate the four 
manuscripts with protocol deviations 
[Plot. Axis label: Adherence to suggestions from additional review] 

Fig 5 Average author adherence to repeated reviewer suggestions based on 5 point Likert scale (1=minimum, 5=maximum). 
Rectangles represent the 51 papers from the additional review group, with paired suggestions both from conventional 
reviews (shown as the horizontal lines on each rectangle) and additional reviews (shown as the vertical lines of each 
rectangle). The side length of rectangles represents the amount of information from any type of review (square root of the 
number of suggestions per manuscript) and the rectangle area represents each paper's overall information. A rectangle 
above the diagonal line indicates that a paper adhered more to the conventional review than to the additional review. For 
example, the asterisked rectangle corresponds to a manuscript receiving 14 suggestions (proportional to the square of the 
vertical sides) from the additional review with a 3.21 average level of adherence, and two suggestions (the square of the 
horizontal sides) from the conventional review with a mean adherence score of 5. Lines in the external margin represent 
papers from the conventional review group, and lines on the internal margin represent papers from the additional review 
group that received both conventional (lines along the vertical axis) and additional (lines along the horizontal axis) reviews; 
lines are repeated here to assist between group comparison. 
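
The rectangle encoding described in the Fig 5 caption can be illustrated with a minimal sketch. It assumes only what the caption states: each side length is the square root of the number of suggestions from one type of review, and a paper lies above the diagonal when its mean adherence to the conventional review exceeds its mean adherence to the additional review. The function names are hypothetical and are not taken from the authors' analysis code.

```python
import math


def rectangle_geometry(n_conventional, n_additional):
    """Return (horizontal side, vertical side, area) for one paper.

    Assumption from the caption: the horizontal side encodes the
    conventional review and the vertical side the additional review,
    each as the square root of the number of suggestions.
    """
    horizontal = math.sqrt(n_conventional)
    vertical = math.sqrt(n_additional)
    return horizontal, vertical, horizontal * vertical


def above_diagonal(adherence_conventional, adherence_additional):
    """True when the paper adhered more to the conventional review."""
    return adherence_conventional > adherence_additional


# Asterisked manuscript from the caption: 14 suggestions from the
# additional review (mean adherence 3.21) and two from the conventional
# review (mean adherence 5).
horizontal, vertical, area = rectangle_geometry(n_conventional=2, n_additional=14)
print(round(horizontal, 2), round(vertical, 2), round(area, 2))
print(above_diagonal(5.0, 3.21))
```

Squaring each side recovers the number of suggestions, which is why the caption describes the counts as the squares of the sides.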